User Profiling based on Tweeter Data using
WordNet and News Paper Archive
Antara Pal¹ and Alok Ranjan Pal²
¹ Dept. of Computer Science and Engineering, Pailan College of Management and Technology, Joka, Kolkata-104,
West Bengal, India
² Dept. of Computer Science and Engineering, College of Engg. and Mgmt., Kolaghat, 721171, West Bengal, India
E-mail: chhaandasik@gmail.com antarapal22@gmail.com
Abstract: In this paper, a method is proposed for user
profiling based on Twitter data. The sentiments of the tweets
are retrieved programmatically with the help of WordNet
and a newspaper archive. In this experiment, the English
WordNet 2.1 has been used as an online semantic dictionary,
and the machine-readable version of the “Times of India”
newspaper has been used to generate the newspaper archive.
The algorithm is tested on a data set of 1000 tweets from four
different categories, which are tagged in advance with their
intended senses so that the derived results can be validated.
First, the data set is evaluated against the newspaper
archive using lexical overlap; the accuracy of this sense
retrieval task is 48.7%. The low accuracy arises because a
single statement can be expressed in many different ways in
natural language, so two statements with the same sense may
share only a few lexical items. To overcome this problem, the
context of each tweet is expanded with the help of WordNet
by adding the synonyms of every meaningful word in the
tweet, and the senses of the tweets are then evaluated again.
Because the contexts of the statements are expanded in this
approach, the semantic relatedness between statements is
captured more effectively, which leads to better system
performance.
Keywords— User Profiling; WordNet; Newspaper Archive;
Tweeter Data; Synset Analysis
I. INTRODUCTION
User profiling is one of the major demands of the current
era. It is used for several purposes, such as: a) personalized
recommendation, where advertisements for selected
products are presented to a user based on his/her activities
on the internet; b) sentiment analysis of a person, whose
state of mind may have an impact on his/her surroundings;
and c) sentiment analysis of a community, which could be
used for administrative decision making.
In this experiment, user profiling is performed based on a
user’s tweets. The overall experiment is carried out in two
phases. First, the sentiments of the tweets are derived
from lexical similarity, with the newspaper archive as a
reference. Because the information in the newspaper
archive is stored in a categorized manner, this knowledge
base can serve as a reference for sense resolution.
However, the performance of the system in this phase was
not satisfactory. The observed reason is that natural
language is creative and every individual expresses his/her
views in a different way, so relating a pair of statements
on lexical similarity alone is unreliable. Therefore, in the
next phase, the semantic relation between the statements is
calculated beyond lexical similarity. To do this, the contexts
of the tweets are expanded by synset analysis of the
meaningful words of the tweets with the help of WordNet.
Once the contexts of the tweets are expanded, the semantic
relation between a pair of statements is identified more
reliably, which leads the system towards better accuracy.
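The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the tiny `ARCHIVE` of invented headlines, the hand-written `SYNONYMS` table (a stand-in for WordNet synsets), and the rule of picking the category with the largest word overlap are all assumptions made only for illustration.

```python
import re

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of",
              "to", "for", "and", "by", "at", "with"}

# Toy stand-in for the categorized newspaper archive (invented headlines).
ARCHIVE = {
    "sports": ["The captain scored a century in the final match"],
    "politics": ["The minister won the election after a close vote"],
}

# Toy stand-in for WordNet synsets (hand-written, not real WordNet data).
SYNONYMS = {
    "skipper": {"captain"},
    "game": {"match"},
    "poll": {"election", "vote"},
}

def content_words(text):
    """Lowercase the text, tokenize it, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS}

def expand(words):
    """Phase 2: add the listed synonyms of every word to the context."""
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded

def categorize(tweet, use_synonyms=False):
    """Return the archive category with the largest word overlap,
    or None when the tweet shares no word with any category."""
    context = content_words(tweet)
    if use_synonyms:
        context = expand(context)
    best_category, best_score = None, 0
    for category, articles in ARCHIVE.items():
        archive_words = set()
        for article in articles:
            archive_words |= content_words(article)
        score = len(context & archive_words)  # Phase 1: lexical overlap
        if score > best_score:
            best_category, best_score = category, score
    return best_category

tweet = "Our skipper played a great game"
print(categorize(tweet))                     # None
print(categorize(tweet, use_synonyms=True))  # sports
```

In this toy case the tweet shares no surface word with either category, so plain lexical overlap fails; after synonym expansion, "skipper" and "game" match "captain" and "match" in the sports entry.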
In this experiment, English WordNet 2.1 has been used as
an online semantic dictionary, and the newspaper archive
has been prepared from the online version of the “Times of
India” newspaper.
II. SURVEY
Zhongqi Lu et al. [1] proposed a Collaborative Evolution
model, which learns the evolution of users’ profiles
from the sparse historical data in recommender
systems and outputs the prospective user profile for the
future. To verify the effectiveness of the proposed model,
the authors conducted experiments on a real-world dataset
obtained from the online shopping website
www.51buy.com, which contains more than 1 million users’
shopping records over a time span of more than 180 days.
O. Hasan et al. [2] proposed a work on user profiling with
big data techniques and the associated privacy challenges.
The authors also discussed the ongoing EU-funded
EEXCESS project as a concrete example of constructing
user profiles with big data techniques and the approaches
being considered for preserving user privacy.
Grcar Miha et al. [3] addressed the problem of
personalized information delivery on the Web based on
user profiling. The authors analyzed different approaches
to user profiling, such as content-based filtering,
collaborative filtering and Web usage mining. They
presented an overview of these approaches, including
recent research results in the area, with special emphasis
on user profiling from the perspective of Semantic Web
applications.
2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT) | 978-1-7281-5791-7/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICAECT49130.2021.9392398